Could We Automatically Reproduce Semantic Relations of an Information Retrieval Thesaurus?
Author
Abstract
A well-constructed thesaurus is recognized as a valuable source of semantic information for various applications, especially for Information Retrieval. The main obstacles to thesaurus-oriented approaches are the high complexity and cost of creating thesauri manually. This paper addresses the problem of automatic thesaurus construction; namely, we study the quality of automatically extracted semantic relations compared with the semantic relations of a manually crafted thesaurus. A vector-space model based on syntactic contexts was used to reproduce relations between the terms of a manually constructed thesaurus. We propose a simple algorithm for representing both single-word and multiword terms in the distributional space of syntactic contexts. Furthermore, we propose a method for evaluating the quality of the extracted relations. Our experiments show a significant difference between the automatically and manually constructed relations: while many of the automatically generated relations are relevant, only a small part of them could be found in the original thesaurus.
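The abstract does not spell out the model, the term-representation algorithm, or the evaluation method, so the following is only a minimal sketch of the general idea under stated assumptions. The context counts and the "thesaurus" are invented toy data; representing a multiword term as the sum of its component words' context vectors, relating each term to its nearest neighbour by cosine similarity, and scoring against the thesaurus with precision and recall over unordered term pairs are all hypothetical choices, not necessarily the authors' method.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: each word maps to counts of the syntactic contexts
# it occurs in (e.g. "obj:retrieve" = direct object of "retrieve").
CONTEXTS = {
    "document": Counter({"obj:retrieve": 4, "obj:index": 3, "mod:relevant": 2}),
    "text": Counter({"obj:retrieve": 2, "obj:index": 1, "mod:relevant": 1}),
    "query": Counter({"obj:submit": 3, "mod:relevant": 1}),
    "search": Counter({"obj:submit": 1, "mod:full": 2}),
    "engine": Counter({"mod:full": 1, "obj:submit": 1}),
}


def term_vector(term):
    """Sparse context vector for a single-word or multiword term.

    Assumption: a multiword term is the sum of its component words' vectors,
    one simple choice that is not necessarily the paper's algorithm.
    """
    vec = Counter()
    for word in term.split():
        vec.update(CONTEXTS.get(word, Counter()))
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(u[c] * v[c] for c in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def extract_relations(terms, k=1):
    """Relate each term to its k most similar other terms, as unordered pairs."""
    vectors = {t: term_vector(t) for t in terms}
    pairs = set()
    for t in terms:
        others = [x for x in terms if x != t]
        others.sort(key=lambda x: cosine(vectors[t], vectors[x]), reverse=True)
        for other in others[:k]:
            pairs.add(frozenset((t, other)))
    return pairs


def evaluate(extracted, gold):
    """Share of extracted pairs present in the thesaurus (precision)
    and share of thesaurus pairs that were recovered (recall)."""
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    terms = ["document", "text", "query", "search engine"]
    # Hypothetical thesaurus relations standing in for the manual resource.
    thesaurus = {frozenset(("document", "text")), frozenset(("query", "document"))}
    extracted = extract_relations(terms, k=1)
    print(evaluate(extracted, thesaurus))  # (0.5, 0.5)
```

In this toy run the pair (query, search engine) is extracted and looks plausible, yet it is absent from the hypothetical thesaurus, which mirrors in miniature the kind of mismatch the abstract reports; the numbers themselves carry no evidential weight.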
Similar resources
Extraction de termes, reconnaissance et labellisation de relations dans un thésaurus (Term extraction, recognition and labelling of relations in a thesaurus)
In documentary systems, thesauri are commonly integrated into the indexing and information retrieval steps. In libraries, documents carry rich descriptive information produced by librarians in catalogue records based on the Rameau thesaurus. We exploit these two kinds of information in order to create a first semantic structure. A step of conceptualization allows us to define the various modu...
Identifying Semantic Relations in Text for Information Retrieval and Information Extraction
Automatic identification of semantic relations in text is a difficult problem, but is important for many applications. It has been used for relation matching in information retrieval to retrieve documents that contain not only the concepts but also the relations between concepts specified in the user’s query. It is an integral part of information extraction—extracting from natural language text...
Ontology-Based Word Sense Disambiguation by Using Semi-Automatically Constructed Ontology
This paper describes a method for disambiguating word senses by using semi-automatically constructed ontology. The ontology stores rich semantic constraints among 1,110 concepts, and enables a natural language processing system to resolve semantic ambiguities by making inferences with the concept network of the ontology. In order to acquire a reasonably practical ontology in limited time and wi...
Automatic Detection of Thesaurus relations for Information Retrieval Applications
Is it possible to discover semantic term relations useful for thesauri without any semantic information? Yes, it is. A recent approach to automatic thesaurus construction is based on explicit linguistic knowledge, i.e. a domain-independent parser without any semantic component, and on implicit linguistic knowledge contained in large amounts of real-world texts. Such texts include implicitly the ...
Semi-Automatic Practical Ontology Construction by Using a Thesaurus, Computational Dictionaries, and Large Corpora
This paper presents a method for the semi-automatic construction of a practical ontology from various resources. In order to acquire a reasonably practical ontology in a limited time and with less manpower, we extend the Kadokawa thesaurus by inserting additional semantic relations into its hierarchy, which are classified as case relations and other semantic relations. The former can be obtained ...